ABSTRACT

Three kinds of instructionally sensitive achievement tests are
described which provide useful information on the proficiencies
addressed by formal schooling: placement, progress, and attainment
tests. Procedures to design, develop, and empirically verify such
tests are presented.




DEVELOPMENT AND VERIFICATION OF INSTRUCTIONALLY SENSITIVE
ACHIEVEMENT TESTS



Ralph A. Hanson, George E. Behr, Barbara T. Meguro, and Jerry D. 
Bailey 



From the early 1930's (e.g., Tyler, 1934) to contemporary times,
it has been regularly acknowledged that the standard technology for
developing achievement tests yields measures that are insensitive
tools for measuring instructional program effects (Tyler, 1972;
Buros, 1977; Hanson, Schutz & Bailey, 1980; Madaus, Airasian &
Kellaghan, 1980). Insensitive in this context means they are
inadequate for identifying instructional effects and exemplary
schooling practices (Hanson & Schutz, 1978). However, such
instruments continue to be developed and used, at least in part
because there are seemingly "no alternatives" (Buros, 1978).

A methodology for providing instructionally sensitive tests
that has been formulated, tested, and replicated in practice is
presented in this report. It entails three kinds of achievement
tests, each of which has clearly defined information functions in
connection with an instructional product system.



Context 

The report focuses on the method for developing instruments
rather than on the broader methodological context within which the
instrumentation technology was derived and verified. Background on
this broader methodological context may be found elsewhere (Hanson &
Schutz, 1978; Hanson, Bailey & Molina, 1980). However, it is
relevant to note that this context is termed programmatic
educational R&D and has been nurtured over the past decade and a
half in various forms by Regional Educational Laboratories and R&D
Centers.

One of the important outcomes yielded by work in the
Laboratories and Centers during the late 1960's and early 1970's was
the development and implementation of instructional product systems.
These new product systems appear at first look to be simply "more
instructional materials." However, they differ in a number of ways
from conventional instructional materials. For the most part, these
differences are in degree rather than kind, which makes them
unobtrusive. For example, the design specifications, which are the
blueprints for research-based instructional product systems, are
derived from careful analytical and empirical inquiry rather than
tradition, the "consensus" of curriculum experts, etc. Similar
differences can be found in the way actual instructional materials
are prepared and tested and the way personnel training and
installation components are developed. The latter components
provide direct support for school efforts to use the product system.

Programmatic R&D efforts at SWRL have contributed several
comprehensive product systems for instructional use in schools,
permitting a reliability of schooling effects not previously
available (e.g., Hanson & Schutz, 1978; Hanson, Bailey & Molina,
1980; Hanson, Schutz & Bailey, 1980). Reliability of effects simply
means that when these product systems are used in schools under
usual conditions, defined instructional outcomes are attained with
less variance and higher replicability than with other forms of
instruction. Furthermore, the variance observed in effects can be
linked directly to the operational practices employed in the use of
the product systems.

While such product systems have obvious value to schools and to 
educational practice in general, they also provide the basis for a 
new kind of research effort. This research effort centers around 
the use of product systems as the instrumentation system for 
studying major educational issues. One such issue is achievement 
testing in schools, and the methodology described here was derived 
from single- and multi-year inquiries pertinent to this issue using 
various product systems and conducted with the cooperation of many 
school districts across the country. 

INSTRUCTIONAL ACHIEVEMENT TESTS 

In this section of the paper, the characteristics and specific
functions of the instruments which are yielded by the method are
described. Subsequent sections describe procedures for constructing
and verifying these instruments.

Three specific kinds of tests are treated here as necessary and
sufficient for describing achievement in connection with an
instructional product system. These are referred to as placement,
progress, and attainment tests, and together they constitute the
measurement elements of an instructionally sensitive instrumentation
system. Descriptive characteristics of each kind of test are given
in Table 1.



Placement Tests 

Placement tests provide information that is used to guide the
instructional assignment of students prior to involvement of a
student in a given instructional program. This information can be
used to help select students who can benefit from the instruction
and to identify a segment of the product system where the student
might best begin work. Another use of placement test information is
to provide a description of the skills/concepts of a student or of
student groups in a given instructional program. Such descriptions
can be useful as baseline information for evaluating program effects
(Hanson, 1980).

Table 1: Characteristics of Tests Forming an Instructionally Sensitive
Instrumentation System

                      Placement            Progress             Attainment

Instructional Unit    A set of related     A topic or unit      A single segment
Referenced            instructional        of a segment
                      segments

Instructional Time    60 - 120             5 - 10               30 - 50
(Hrs.) Referenced     (2-4 segments)

Schooling             1-6 years            1-6 weeks            1-6 months
Boundaries

Score Referents       single segment       unit or topics in    full segment or
                                           a unit within a      outcome areas of
                                           segment              a segment

Expected Results      consistent with      high                 high
                      structural
                      relationship
                      between segments

Typical Reporting     needs instruction/   proficient/          continuous
                      does not need        non-proficient       (percent scores)
                      instruction

When Given            before instruction   during instruction   after completion
                      begins                                    of a segment

Selection, placement, and baseline information have
conventionally been derived from standardized achievement test
scores and teacher judgments. However, evidence gathered through
product system exercises shows that such "laissez-faire" approaches
to pupil selection/placement can result in significant losses in
school effectiveness, especially due to underplacement of students
(e.g., Hanson, Bailey, & Molina, 1980; Behr & Hanson, 1977).



Progress Tests 

Progress tests serve to provide information on a student's
learning status during the course of instruction. Such tests are
used at frequent intervals (often daily or weekly, and typically
within monthly intervals). The information provided serves as the
basis for the immediate assignment of instruction. Also, it
provides a timely indicator of student progress in terms of lessons
completed. Aggregates of this information for classes and schools
yield fine-grain information on rate and amount of instruction
completed and as such serve as markers of product system
implementation. Other than in aggregated form, progress test
information has little value for audiences outside the classroom
since it addresses instructional management rather than pupil
attainment.


Inquiries carried out using product systems have verified these
points and provided some insights into issues surrounding progress
tests. One specific finding is that progress tests need not be
referenced to a single "objective," a procedure which until recently
had been widely advocated (Popham, 1975; Wolf, 1979). Put another
way, the frequency and precision of progress test information
suitable for self-instructional programs is far greater than the
information function such tests can reasonably perform for
instruction in conventional classroom settings (Follettie, 1980).

Another related finding is that the methods used to obtain
progress test information can often be integrated into instructional
activities, making them virtually unobtrusive. While formal progress
tests may serve well in the context of the typical
self-instructional sequence, they are not universally appropriate
for classroom-centered instruction. Where students are performing
instructional tasks on a regular basis, formal extrinsic progress
tests are unnecessary and undesirable.



Attainment Tests 

Attainment tests serve several functions. One function is to
acknowledge that notable student learning has (or has not) occurred
in the instructional program. This achievement is reflected in the
proficiency displayed on attainment tests. Alternately stated, the
acknowledgment function clearly and concretely describes what
students do and can be expected to learn in instruction that entails
the product; an attainment test is the operationalization of the
direct effects of an instructional program.

A related function is to serve as an "output" measure for
program "evaluation" and communications. For the communication
purpose, attainment test proficiencies are usually aggregated by
classes or schools. They are then used both to describe overall
(aggregate) effectiveness and as a dependent variable for research
aimed at identifying the factors contributing to attainment (e.g.,
Hanson & Schutz, 1980; Hanson, Bailey & Molina, 1980). The research
information, when properly assembled, can serve as an operational
basis for instructional planning (Hanson, 1978).

These functions of attainment tests are not fulfilled by tests
typically used in school settings. Standardized achievement tests
do provide indicators of general learning with little relationship
to either instruction received or product system effects (Hanson,
Schutz & Bailey, 1980; Madaus et al., 1980). Teachers or district
R&D staff sometimes provide a form of attainment test referenced to
"instructional objectives." These instruments usually do not
provide adequate information about instructional effects from either
a descriptive or planning perspective. Publishers and other
suppliers of instructional products also provide tests. However,
the instruments often turn out to be progress tests rather than
attainment tests and thus are not able to fulfill the descriptive
and planning information functions of attainment tests.

Chronological Test Development Schedule

The three kinds of tests reference related aspects of an
instructional program and therefore are interdependent in design and
use. In real-time operational use with a product system, placement
tests come first, progress tests second, and attainment tests last.
However, this is not the optimum chronology for design/development
activities. In generating the tests, the progress tests emerge
first as the development of instructional segments of the product
system is completed. The progress tests operationalize the outcomes
(i.e., skills/information) being taught in the specific activities
to which they refer. They should not include either outcomes taught
earlier or outcomes taught in a different form than presented in the
instruction referenced.
The second test development effort focuses on attainment tests
and can reasonably begin only after development for at least one
segment of a product system has been completed. Operationally this
usually means that all progress tests (or prototypes of them) would
by this time be available for the segment. With this chronology,
the instructional specifications prepared for the attainment tests
can serve as important analysis/verification for the instructional
design/development effort. To fully complete the construction of
attainment tests for a product system, it is necessary to have
available all instructional segments and accompanying progress
tests. The separate instructional specifications and accompanying
test specifications can then be checked for consistency and overlap
before proceeding further.

The development of the placement test must await the
development of all attainment tests since it requires the use of
both the attainment test specifications and the empirical
verification data on them. The placement test is prepared by
selecting items from the completed attainment tests using both data
and specifications. Individual items which best differentiate
pupils completing one segment from those completing the next segment
are selected.
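
The selection step just described can be sketched in a few lines. This is only an illustration under stated assumptions: the paper does not name the statistic used to judge how well an item "differentiates" two groups, so the sketch assumes a simple gap in proportion correct between pupils who have completed the earlier segment and pupils who have completed the next one. The function name, item identifiers, and tryout values are all hypothetical.

```python
def select_placement_items(p_before, p_after, n_items):
    """Pick the items that best separate two adjacent completion groups.

    p_before: item id -> proportion correct among pupils who have
              completed the earlier segment only (tryout data).
    p_after:  item id -> proportion correct among pupils who have
              completed the next segment.
    The gap between the two proportions is an ASSUMED index of
    differentiation; the source does not specify a statistic.
    """
    shared = [item for item in p_before if item in p_after]
    # A larger gap means the item is passed mostly by the later group.
    ranked = sorted(shared,
                    key=lambda item: p_after[item] - p_before[item],
                    reverse=True)
    return ranked[:n_items]


# Hypothetical tryout data for three attainment test items.
tryout_before = {"item_a": 0.20, "item_b": 0.50, "item_c": 0.10}
tryout_after = {"item_a": 0.90, "item_b": 0.60, "item_c": 0.50}

chosen = select_placement_items(tryout_before, tryout_after, 2)
print(chosen)
```

In practice the ranking would be combined with the test specifications, since the text calls for using "both data and specifications" rather than the data alone.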


Reasonable procedures for designing and developing progress
tests are available in the literature of test construction and
self-instructional technology and so require no additional
elaboration here. In the following sections, specific procedures
for developing and empirically verifying attainment and placement
tests are presented and discussed. It is assumed that both the
instructional materials and progress tests for the product system
are completed and available.

ATTAINMENT TEST DEVELOPMENT 

The process of preparing attainment tests typically takes place
in three phases: instructional specifications, test specifications,
and test verification. A brief description of the major tasks in
each phase is given in Table 2.



Instructional Specifications 

There are two major tasks in this phase. The first is to
structure the segments of instruction to be dealt with. As
indicated in Table 1, it is recommended that each attainment test be
designed to measure a segment of 30 to 50 hours of instruction.
This segmentation pattern is based on several considerations. Given
current educational practices, it corresponds roughly to a quarter
or a semester of instruction in a subject area for a class. More
importantly, it approximates the minimal amount of instructional
time for educational effects to occur that have meaning for
audiences outside the classroom (see, e.g., Tyler, 1934), which is
the prime audience for attainment tests.

TABLE 2

DESCRIPTION OF PHASES IN PRODUCING
INSTRUCTIONAL ATTAINMENT TESTS

Phase           Major Tasks                         Product

Instructional   1. Specify the instructional        Description of the
Analysis           segment to be assessed           skills/concepts
                2. List the skills/concepts         taught by
                   taught. For each:                instructional
                   a. List or define the            segment
                      elements practiced
                   b. List the practice
                      format
                   c. Determine the amount
                      of practice

Test            1. Determine the skills and         Preliminary test
Construction       concepts assessed                specifications and
                2. Designate the item format        prototype tests
                   for each skill/concept
                3. Specify boundaries for
                   each skill/concept
                4. Specify the item sampling
                   plan based on instructional
                   emphasis

Test            1. Generate prototype               Final test
Verification       item sets using test             specifications and
                   specifications                   final tests
                2. Distribute to users
                3. Score tests and
                   analyze results
                4. Identify actual test
                   items

The second major task is to analyze each instructional segment
to identify the skills/concepts presented and the amount of direct
instruction provided on each. During this analysis, several aspects
of each instructional element (i.e., skill/concept) should be noted.
These are conveniently described and illustrated via an example of
such specifications. Sample instructional segment specifications
are given in Table 3 for one segment (Block 8) of the SWRL/Ginn
Reading Program. The attainment test was structured to provide
separate scores on two outcome areas entitled Word and
Sentence Meaning and Paragraph and Text Interpretation.

1. Format designation. For each element (skill/concept)
    taught in a segment, the specific characteristics of the
    way it is practiced during instruction are noted. Thus,
    for the Word and Sentence Meaning outcome area described
    in Table 3, students learn specific words, using sentences
    with a multiple choice format, with an average syntax value
    of 2.56, that have no new words in the stem, and with new
    words used in the foils.

    For the second outcome in Table 3, Paragraph and Text
    Interpretation, these same specifications apply plus others
    associated with the various types of question. As the note
    indicates, examples of each question type are included in
    the actual specification. Here just the type of question
    is listed.

2. Element designation. The elements referenced to a specific
    format are to be listed and described. Thus, for the Word
    and Sentence Meaning outcome area in Table 3, this set is
    defined by a list of words that could be the object of a
    question. For Paragraph and Text Interpretation the elements
    are paragraphs used in instruction, defined in terms of
    length, range of acceptable syntax complexity, and specific
    vocabulary.

3. Identification of subsets of elements. The amount of
    practice provided for each cluster of elements is
    determined by counts of the frequency of practice. Using
    this information, categories corresponding to various
    different levels of element emphasis within a segment can
    be ascertained. Eventually, interest will center on those
    elements emphasized sufficiently to be considered taught to
    most students. These elements (or a sample of them) will
    eventually be included on the prototype test forms.
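
The frequency counting in step 3 can be illustrated with a small sketch. It assumes that instruction has been transcribed into a flat log with one entry per practice opportunity, and it uses a cutoff (`min_practice`) to decide which elements count as "emphasized sufficiently"; the paper gives no numeric cutoff, so that parameter, the function names, and the sample words are all hypothetical.

```python
from collections import Counter


def practice_counts(practice_log):
    """Tally how often each element is practiced in a segment."""
    return Counter(practice_log)


def taught_elements(counts, min_practice):
    """Return the elements practiced at least `min_practice` times.

    `min_practice` is an ASSUMED cutoff standing in for "emphasized
    sufficiently to be considered taught"; the source does not give
    a number.
    """
    return sorted(elem for elem, n in counts.items() if n >= min_practice)


# Hypothetical log: one entry per practice opportunity in a segment.
log = ["ship", "ship", "ship", "tent", "ship", "tent", "lamp"]

counts = practice_counts(log)
print(counts["ship"], counts["tent"], counts["lamp"])
print(taught_elements(counts, min_practice=2))
```

The same counts can later serve as the relative weights for item sampling, which is why it is convenient to keep the raw tallies rather than only the filtered list.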
TABLE 3: SAMPLE INSTRUCTIONAL SPECIFICATIONS [1]
SWRL/Ginn Reading Program - Block 8

Skill or Concept: Word and Sentence Meaning

    Format:
        Sentence Completion; Multiple Choice

    Stimulus Characteristics:
        Average Length of Sentence: 7 words
        Average Syntax Value [2]: 2.56
        Average Number of Other New Program Words in Stem: 0

    Elements Practiced                            Frequency
        Storybook Words: practiced both in        268 words are
        stories and in workbook activities        taught [4]
        Non-Storybook Words: practiced only       209 words are
        in workbook activities                    taught [4]

Skill or Concept: Paragraph and Text Interpretation

    Format:
        A Passage Followed by Multiple-Choice Questions

    Stimulus Characteristics:
        Average Syntax Value [2]: 2.58

    Elements Practiced                            Frequency
        Literal Questions [3]                     66 items
        Concept identification in the             5 items
        question [3]
        Concept identification in the             14 items
        answer [3]
        Title/Main Idea [3]                       8 items
        Purpose [3]                               10 items

[1] Information in this table is taken from Final Block Assessments
for Elementary CSP, a deliverable under Task 1.5.2 of N.I.E. Contract
No. NE-C-00-3-0064, SWRL Educational Research and Development, Los
Alamitos, CA, May, 1977.

[2] Botel, M. and Granowsky, A. A formula for measuring syntactic
complexity: A directional effort. Elementary English, 1972, 49
(April), 513-516.

[3] Definitions and examples are included in the complete
specifications but are not reprinted here.

[4] Exact word lists are included in the complete specifications but
are not reprinted here.

Test Specifications

Once the instructional analysis has been completed for each
segment of the product system, the process shifts to the second
phase, test specifications. The intent here is to specify the
characteristics of those subsets of the instructional element
clusters that would be expected to be learned by all pupils
completing the segment, i.e., to eliminate elements that did not
receive enough attention in instruction to be learned. Each cluster
so identified must then be represented accurately and
proportionately as part of the test specifications. The following
activities need to be carried out:

1. Identify anticipatory skills and concepts in segments.
    These are the elements that are included in a segment in
    anticipation of learning in subsequent segments and should
    not be included in test specifications. Since the purpose
    of the test is to describe the instructional attainment of
    students, classes, and schools, there is no reason to assess
    anything but direct effects of instruction, i.e., those
    skills and concepts that would be learned upon the
    completion of an instructional segment and represented in
    their most highly developed form.

2. Identify patterns of instructional emphasis across segments.
    Skills and concepts that are addressed in more than one
    instructional segment need to be identified during
    formulation of the test specifications. This is why the
    instructional specifications (phase 1) for all segments are
    needed before phase 2 can be completed. Depending on the
    instructional format and organization within segments, a
    given element may be taught definitively (i.e., to mastery,
    some would say) in one segment; may be taught in part in
    several segments; or may never be taught to proficiency.

    Some examples of possible patterns of instructional
    emphasis of the same skill structure over segments are
    described in Table 4. Note that patterns do occur where
    instruction is provided on skills in segments after
    proficiency is expected. This instruction, however, should
    not be tested beyond the segment in which skill proficiency
    is expected.

TABLE 4

FIVE ILLUSTRATIVE INSTRUCTIONAL PATTERNS OCCURRING
ACROSS FOUR PROGRAM SEGMENTS FOR AN OUTCOME

    Pattern          Description

1.  N  N  I^ N       Instruction only in segment 3 with
                     proficiency expected after segment 3.

2.  I  N  I^ N       Instruction in segments 1 and 3 with
                     proficiency expected after segment 3.

3.  I  N  I  N       Instruction in segments 1 and 3.
                     Proficiency not expected.

4.  I^ N  I  I       Instruction in segments 1, 3 and 4.
                     Proficiency expected after segment 1.

5.  I  I^ I  I       Instruction in segments 1, 2, 3 and 4.
                     Proficiency expected after segment 2.

Legend
    I - Instruction given in segment
    N - Instruction not given in segment
    ^ - Pointer marking when proficiency is expected and
        testing would take place

3. Segment subscores/outcomes. Within a given instructional
    segment, it is unusual for more than a single score to be
    required to adequately measure instructional attainment
    (Hanson, 1980). However, it is often desirable to have two
    or three outcomes to adequately describe the instructional
    outcome attainments. The notion explicit in this statement
    is that a primary purpose of an "outcome area" is to
    provide a description of a segment of instruction at a
    level of detail that makes it understandable to persons not
    intimately acquainted with the instructional program (i.e.,
    those not delivering day-to-day instruction, such as
    administrators, parents, school board members). The
    highest level of generality that allows for meaningful
    effects to manifest themselves is sought. This usually
    results in three or fewer scores per segment.

    The fact that an outcome area designation may be
    popular jargon, e.g., reading comprehension, math problem
    solving, does not mean that the resulting attainment tests
    can be readily compared to other tests with subscores
    referencing the same categories, e.g., standardized tests.
    The defined structure is applicable in a particularized
    form to an instructional program. Another way of
    illustrating this point is by considering the tests
    produced via this method for two instructional programs.
    While they might conceivably share common outcome
    descriptors, it would be unlikely that the tests would
    produce comparable results in use with groups of students
    receiving instruction in either program and, in fact, they
    have been shown not to (Hanson & Bailey, 1980). This is
    because the essence of their respective scope and nature is
    contained in the distinctions between their respective test
    specifications (e.g., the form of the questions, lexicon
    used, and allowable syntactical structures). Such
    differences are usually only detectable when test
    specifications are carefully prepared and empirically
    tested (Hanson & Bailey, 1980). Often they cannot be
    detected even when comparing two different test forms to
    one another. The point is that segment structures do not
    typically have, and should not be interpreted as having,
    common interpretability simply on the basis of their
    titles.

4. Resolve the "number of items" question. One of the most
    important features of attainment tests is the economy in
    testing time they afford over other types of achievement
    tests. Consistent with good general measurement
    procedures, multiple independent observations (to be
    referred to as items) are required for each outcome area
    (or single segment score) of an attainment test. However,
    the number of such observations or items required to
    provide the level of accuracy for the uses of attainment
    tests is considerably smaller than might be expected.
    Experience with such tests suggests that 30 items is the
    absolute maximum number required. This guideline assumes
    that student level score interpretation will center around
    distinctions in minimal proficiency (typically less than
    60%), preliminary proficiency (typically 60% to 80%), and
    consolidated proficiency (typically 80% or more). Also, in
    determining the exact number of items, the type of format
    employed (true-false, multiple choice, sentence completion,
    essay), the number of elements referenced, and, more
    generally, the extent to which individual items
    discriminate between students and student groups receiving
    different amounts of instruction are important. The latter
    aspect is important because it refers to the information
    yield of an item. Where information yield is high,
    relatively few items are typically required. For example,
    an attainment test score may be based on as few as three or
    four items and function effectively.

5. Sampling of instructional skills/concepts to be tested. A
    strategy for sampling based on the relative instructional
    emphasis given to element clusters of a segment or outcome
    area of a segment must be devised. Frequency counts of
    amounts of practice provided, derived from the instructional
    specifications, can be treated like "weights" showing the
    relative importance of the various clusters. The sampling
    across strata is then determined by the amount of practice
    given in the instruction.
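
The weighting idea in step 5 can be sketched as a proportional allocation of test items across element clusters. This is a minimal illustration, not the procedure used by the authors: it assumes plain proportional weights and uses a largest-remainder rounding rule so the allocations sum exactly to the test length. The cluster names and practice counts are hypothetical, and actual specifications may apply other weightings.

```python
def allocate_items(practice_counts, total_items):
    """Split `total_items` across clusters in proportion to practice.

    practice_counts: cluster name -> frequency of practice (the
                     "weights" from the instructional specifications).
    Largest-remainder rounding is an ASSUMED convention; the source
    only says sampling follows the amount of practice.
    """
    total_practice = sum(practice_counts.values())
    if total_practice <= 0:
        raise ValueError("practice counts must sum to a positive number")

    # Exact (fractional) share of the test owed to each cluster.
    quotas = {name: total_items * count / total_practice
              for name, count in practice_counts.items()}
    # Start from the whole-number part of each share.
    allocation = {name: int(quota) for name, quota in quotas.items()}

    # Hand out the remaining items to the largest fractional parts.
    leftover = total_items - sum(allocation.values())
    by_remainder = sorted(quotas,
                          key=lambda name: quotas[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical practice frequencies for three element clusters.
weights = {"cluster_a": 50, "cluster_b": 30, "cluster_c": 20}
plan = allocate_items(weights, total_items=7)
print(plan)
```

Items would then be drawn at random within each cluster up to its allocated count, which keeps the test proportionate to instructional emphasis.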

The specifications in Table 5 illustrate the results of steps 1
to 5 for the same segment (Block 8 of the SWRL/Ginn Reading Program)
referred to in the earlier discussion of instructional
specifications (see Table 3). The two specific outcomes, i.e., Word
and Sentence Meaning and Paragraph and Text Interpretation, refer to
different but complementary aspects of the instruction. While they
are loosely related in that one would expect students doing well on
Paragraph and Text Interpretation to do well on Word and Sentence
Meaning (but not vice versa), they were differentiated in
instruction by different kinds of practice. More importantly, they
represent language skill areas that are often differentiated in
reading tests. Thus, in spite of the fact that one skill area might
be subsumed under the other, they were treated as separate outcome
areas for purposes of attainment testing.

A noteworthy distinction between test specifications (Table 5)
and instructional specifications (Table 3) is the sharply increased
level of specificity required for the former. Instructional
specifications can be (and are) more general than those for a test
since not everything presented in instruction is taught and not
everything taught is tested. However, the opposite condition must
hold, i.e., everything tested must be taught. This is what the
additional constraints of the test specifications are designed to
ensure. The general guideline for these specifications is referred
to as the "least common denominator" approach. It requires that
everything defined by the test specifications be clearly taught in
the instruction, but not that everything taught be encompassed by
the specifications.
TABLE 5: SAMPLE ATTAINMENT TEST BOUNDARIES [1]
SWRL/Ginn Reading Program: Block 8

Outcome: Word and Sentence Meaning (30 items)

    Item Format:
        Sentence Completion; Multiple Choice

    Stimulus Characteristics:
        1. Item stem should include only words taught prior to
           this block.
        2. Sentence length should be about 7 words.
        3. Syntax value should be about 2 or 3.
        4. Sentence should not discriminate against subgroups
           such as black dialect.

    Distractor Characteristics:
        1. All distractors should be new words taught in this
           block.
        2. Distractors should be clearly wrong, not based on
           shades of meaning.
        3. Distractors should be the same part of speech as the
           answer.

    Content Parameters:
        Storybook Words (weighted twice as much as non-storybook
        words)
        Non-Storybook Words (weighted proportionally to frequency)
        New Words that use the same decoding skills

    Sampling:
        Unit     Items
        1        5
        2        5
        3        4
        4        5
        Total    19

Outcome: Paragraph and Text Interpretation (14 items)

    Item Format:
        Passage Followed by Multiple Choice Questions

    Stimulus Characteristics:
        1. Question should not be answerable without reading the
           passage.
        2. Passage should have an average syntax value of about
           2.5.
        3. Passage must meet the specific characteristics for the
           type of item (literal, concept identification, [2]
           etc.)

    Distractor Characteristics:
        Usually one distractor of each of these types:
            a) partially incorrect
            b) opposite
            c) plausible in reality but unrelated to text
        All distractors must be plausible.
        Distractors for one question should not provide clues to
        another question.
        Do not use "story doesn't say."
        Do not require fine level discriminations.

    Content Parameters:
        Literal Questions
        Concept Identification in the Question
        Concept Identification in the Answer
        Title/Main Idea
        Purpose

[1] Information in this table is taken from Final Block Assessments
for Elementary CSP, a deliverable under Task 1.5.2 of N.I.E. Contract
No. NE-C-00-3-0064, SWRL Educational Research and Development, Los
Alamitos, CA, May 1977.

[2] Specific requirements of these types of items are included in the
full domain boundaries but are not reprinted here.

An example to illustrate this point can be seen in the first
stimulus characteristic for Word and Sentence Meaning in Table 5.
It states that item questions should include only words taught in
earlier segments (i.e., Blocks) of instruction. In the actual
instruction on this block, some words in item questions from the
current segment were used. However, use of the current block words
in items would directly confound attainment measurement of the Word
and Sentence Meaning outcome area since each item would not measure
the meaning of new words in isolation. Thus, the test specifications
are more restrictive than those actually used in instruction.
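
A check of this kind lends itself to a simple automated screen. The sketch below is illustrative only: it assumes the vocabulary taught before the block and the new words of the current block are available as lowercase word sets, and it tests two of the Table 5 constraints (stem words come only from earlier blocks; distractors are new words from the current block). The function name, word lists, and sample items are hypothetical.

```python
def check_item(stem, distractors, prior_words, current_block_words):
    """Report violations of two vocabulary constraints on one item.

    prior_words:         lowercase words taught before this block.
    current_block_words: lowercase new words taught in this block.
    Returns a list of problem descriptions (empty if none found).
    """
    problems = []
    for raw in stem.lower().split():
        # Drop punctuation and the blank ("____") of a completion item.
        word = raw.strip('.,;:?!"_')
        if word and word not in prior_words:
            problems.append("stem word not taught before this block: " + word)
    for distractor in distractors:
        if distractor.lower() not in current_block_words:
            problems.append("distractor is not a new word from this block: "
                            + distractor)
    return problems


# Hypothetical vocabulary and two hypothetical prototype items.
prior = {"the", "dog", "ran", "to", "his"}
current = {"kennel", "meadow", "harbor"}

ok = check_item("The dog ran to his ____.", ["meadow", "harbor"],
                prior, current)
bad = check_item("The dog ran to the kennel ____.", ["meadow", "his"],
                 prior, current)
print(ok)
print(bad)
```

A screen like this catches only vocabulary violations; the other Table 5 constraints, such as syntax value and part of speech, would still need review by the test writer.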

A variation of the "least common denominator" approach is applied
to the second stimulus characteristic in Table 5. It states that
the length of the stimulus (stem question) should be about seven
words. This length is the median value found in the instructional
materials.

The specifications in Table 5 also indicate the pedagogical
categories and sampling to be carried out for the test. The
pedagogical categories for the Word and Sentence Meaning outcome
include three different categories of words taught in the segment.
These are enumerated during the instructional analysis, when the
relative amount of practice given to the three kinds of words is
specified. Only those words practiced enough to be taught are
included. These words are then sampled by strata to produce the
final set of concepts to be tested for the segment.
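
The sampling step just described can be sketched in a few lines of
Python. This is a minimal illustration under stated assumptions, not
the procedure used in the report: the function name, the
`min_practice` threshold for "practiced enough to be taught," and the
rule of allocating items to each category in proportion to its number
of eligible words are all hypothetical choices introduced here.

```python
import random


def sample_words_by_stratum(words_by_category, practice_counts, n_items,
                            min_practice=3, seed=0):
    """Draw a stratified sample of taught words for a segment test.

    words_by_category: dict mapping category name -> list of words
    practice_counts:   dict mapping word -> number of practice exposures
    n_items:           total number of words to sample
    min_practice:      hypothetical threshold for "practiced enough"
    """
    rng = random.Random(seed)

    # Keep only words practiced often enough to count as taught.
    eligible = {}
    for cat, words in words_by_category.items():
        kept = [w for w in words if practice_counts.get(w, 0) >= min_practice]
        if kept:
            eligible[cat] = kept

    total = sum(len(ws) for ws in eligible.values())
    if total == 0:
        return {}
    n_items = min(n_items, total)

    # Proportional allocation (an assumption), with largest-remainder
    # rounding so the per-category counts add up to n_items exactly.
    quotas = {cat: n_items * len(ws) / total for cat, ws in eligible.items()}
    alloc = {cat: int(q) for cat, q in quotas.items()}
    leftover = n_items - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - alloc[c],
                          reverse=True)
    for cat in by_remainder[:leftover]:
        alloc[cat] += 1

    # Sample without replacement inside each stratum.
    return {cat: rng.sample(eligible[cat], alloc[cat]) for cat in eligible}
```

Allocation in proportion to the amount of practice each category
receives, rather than to the number of eligible words, would be an
equally plausible reading of the text; only the quota line would
change.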


Test Verification 

The boundaries provided in the Test Specifications phase are
fully sufficient as a basis for generating prototypical items for
each stratum of a segment outcome. A set of these items, typically
larger than the number of items actually used for the test, is
prepared and distributed to instructional program participants for
tryout. Often several forms of a prototype test are prepared to
obtain data on items.

The purpose of the tryout is to obtain item data from
students/classes completing various portions of each instructional
segment. The data so gathered are used to revise test
specifications and to select the specific items to be included in
the completed attainment test. These data are also used in selecting
items for the placement test, which will be discussed in the next
section.

The item data are used in several ways in attainment test
construction. The first use is to ascertain the instructional
sensitivity of items. Because of the nature of the segment
definitions, there should be a clear pattern of increasing item
proficiency by pupils completing more segments of instruction. This
statement applies both to items and to composites of them forming
the outcome area and segment scores. Further, these results should
hold across all units of analysis, i.e., students, classes, schools,
and districts. Any exceptions to this pattern are reasons for
careful examination of both the instructional specifications and the
outcome area boundaries.

Some examples of the kind of results these data provide are given
in Table 6. The table shows results for items displaying six
different patterns (labeled a to f) of proficiency change across
instructional completion quartiles. The quartiles correspond to the
division of actual class-level data into four groups based on the
amount of instruction completed during a school year. Note that
patterns a and b show the desired profile of regularly increasing
proficiency with increases in instructional completion. On the
other hand, patterns c and d show profiles that do not follow the
expected pattern. Pattern c shows essentially "no change" across
quartiles, and pattern d shows alternately increasing and decreasing
proficiency.


The data presented in Table 6 for patterns e and f respectively
provide examples of undesirably high pre-instructional proficiency
and undesirably low post-instructional proficiency. Patterns a and
b both show desired pre- and post-instructional proficiency levels
for items.
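
The screening of quartile profiles described above can be
approximated by a simple rule-based check. The sketch below is
illustrative only: the function name and all four thresholds
(`tolerance`, `min_gain`, `max_initial`, `min_final`) are assumptions
introduced here, since the report does not state numerical criteria
for "too high," "too low," or "no change."

```python
def classify_profile(means, tolerance=2.0, min_gain=10.0,
                     max_initial=60.0, min_final=70.0):
    """Label a profile of mean percent-correct values.

    means is ordered by instructional completion quartile (lowest first).
    All thresholds are hypothetical and would need to be set locally.
    """
    steps = [later - earlier for earlier, later in zip(means, means[1:])]

    # A drop larger than the tolerance breaks the expected pattern
    # of non-decreasing proficiency (pattern d).
    if any(step < -tolerance for step in steps):
        return "alternately increasing and decreasing"

    # Little movement across quartiles (pattern c).
    if max(means) - min(means) < min_gain:
        return "no change"

    # Students already proficient before instruction (pattern e).
    if means[0] > max_initial:
        return "initial proficiency too high"

    # Students still not proficient after instruction (pattern f).
    if means[-1] < min_final:
        return "final proficiency too low"

    return "desired"


# Mean percentages that appear in Table 6.
print(classify_profile([39, 57, 68, 75]))
print(classify_profile([74, 85, 86, 88]))
```

A profile flagged by any of the four rules would then be examined
against the instructional specifications and outcome area boundaries
rather than being dropped automatically.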

Some of the major reasons an item may not perform as expected
are listed below and may be used to guide revisions.

1. Technical flaws. These might be due to unclear directions,
   misleading foils, or misinterpretation of the question.

2. Inappropriate assignment to segment. This involves faulty
   indexing to an instructional segment, so that students learn
   the content either earlier (high proficiency pre and post) or
   later in the program sequence (proficiency is too low for
   students completing the instruction).

3. Inappropriate pedagogical referents. The item requires
   skills/concepts not provided in the instruction. When this
   happens, the specifications usually need to be revised.

Preparation of an attainment test uses the verification data as
the basis for identifying items in the quantity indicated by the
test specifications. The attainment test for each segment will thus
be composed of items that are sensitive to instruction, in
proportion to their emphasis in instruction.






TABLE 6

Proficiency Patterns Related to Instruction Received*

Desirable Proficiency Patterns

   a. When outcome is taught during all four quartiles (WORDS)
   b. When outcome is taught during some quartiles (HYPOTHETICAL)

Undesirable Proficiency Patterns

   c. No change in proficiency (SEQUENCE)
   d. Alternately increasing and decreasing proficiency
      (WORD RECOGNITION)
   e. Initial proficiency too high
   f. Final proficiency too low (WORD ATTACK)

Completion Quartile     1    2    3    4
Mean %                 39   57   68   75
Number of Classes      21   57   54   84

Completion Quartile     1    2    3    4
Mean %                 18   54   56   84

Completion Quartile     1    2    3    4
Mean %                 74   85   86   88
Number of Classes      21   57   54   84

Legend: INSTRUCTION NOT PROVIDED IN THIS QUARTILE /
        INSTRUCTION PROVIDED IN THIS QUARTILE

* All figures on this page except "b" display proficiencies attained
by pupils on the outcomes of various kindergarten reading programs,
as reported by Hanson, R. A., Schutz, R. E., and Bailey, J. D., in
Program-Fair Evaluation of Instructional Programs: Initial Results
of the Kindergarten Reading Readiness Inquiry, Technical Report 57,
SWRL Educational Research and Development, Los Alamitos, California,
1977, pages 33, 38, 40, and 43. Figure "b" gives hypothetical data
since none of the outcomes displayed this pattern.
PLACEMENT TEST DEVELOPMENT

Given that attainment tests have been prepared and verified for
each segment of a product system, placement test development can
take place. The essential task in preparing a placement test is to
select items that yield information to differentiate student
assignment to the most appropriate initial segment of instruction.
To select these items, data must be available and used in
conjunction with the attainment test specifications.



Placement Test Item Selection

What is ideally sought is a small set of items per segment that
show direct change from pre- to post-instruction, yet are
relatively independent of instruction received in adjoining
segments. The kind of information used for this purpose is simply
the average proficiency of a sample of students on attainment test
items from several segments after completing one or more
instructional segments. Such data show how performance on an item
changes with the completion of various instructional segments.

The results presented in Table 7 illustrate how such item data
actually appear and are used. The table presents, for two items
(a and b), the average proficiency of samples of students who have
received instruction in various segments.

Item a shows the pattern of proficiency change that is sought
in placement test items. Students not completing instruction in
segment 3 attain low levels of proficiency on this item. Those
receiving instruction on this segment (and subsequent segments)
attain high levels of proficiency. Items like a will typically
measure skills/concepts that are relatively specific to the segment
they reference (in this case segment 3).

Skills/concepts that are taught across several segments
typically show some sensitivity across several segments and hence
are not efficient for a placement test. Item b in Table 7
illustrates how data on such an item typically appear. Proficiency
increases gradually for student groups completing segments 1, 2, 3,
and 4 and remains at high levels thereafter. Such items are not the
most efficient for use on a placement test.

To summarize, the items picked for a placement test should be
those that the instructional specifications indicate are taught
exclusively in a segment and that the proficiency data indicate
clearly differentiate student groups completing the segment from
those not completing it.
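
The selection rule summarized above can be expressed as a small
screening function. The sketch below is a hypothetical illustration:
the function names, the two thresholds (`min_jump` and `min_share`),
and the reading of the six Table 7 columns as segments 1 through 6
are assumptions made here, not criteria given in the report.

```python
def largest_step(profile):
    """Locate the largest gain between adjacent segment groups.

    profile[i] is the average proficiency of students who have
    completed instruction through segment i + 1.
    """
    gains = [later - earlier for earlier, later in zip(profile, profile[1:])]
    total_gain = sum(g for g in gains if g > 0)
    if total_gain == 0:
        return None
    i = max(range(len(gains)), key=gains.__getitem__)
    # The gain between groups i and i + 1 is attributed to segment i + 2.
    return {"segment": i + 2, "jump": gains[i], "share": gains[i] / total_gain}


def is_placement_candidate(profile, min_jump=30, min_share=0.8):
    """True when nearly all of the gain occurs at a single segment."""
    step = largest_step(profile)
    return (step is not None
            and step["jump"] >= min_jump
            and step["share"] >= min_share)


# Average proficiencies for items a and b as given in Table 7.
item_a = [28, 26, 80, 80, 80, 80]
item_b = [50, 60, 70, 88, 88, 90]

print(largest_step(item_a))            # {'segment': 3, 'jump': 54, 'share': 1.0}
print(is_placement_candidate(item_a))  # True
print(is_placement_candidate(item_b))  # False
```

Such a screen addresses only the empirical half of the rule; whether
a skill is taught exclusively in one segment still has to be read
from the instructional specifications.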
Table 7. Illustrative Data for Selecting Placement Test Items

                     Average Proficiency*

                          Segments
Item        1      2      3      4      5      6

 a         28     26     80     80     80     80
 b         50     60     70     88     88     90

* NOTE: Each value is an average proficiency based on 25 or more
students who have completed the instructional segment.
Placement Test Assembly

The composite placement test is made up of item sets
corresponding to each segment. Typically the number of items
required for each segment is small, i.e., 8 to 10. Thus the full
placement test for an instructional product system with six segments
is typically fewer than 60 items.

To interpret the results of placement test use, a "cutoff" for
each set of segment items is needed. The cutoff is simply a
single-number guide for rule-of-thumb use by those responsible for a
student's initial instructional program placement. To obtain the
cutoff scores, data on the proficiency level attained on the
placement items by student groups who completed each segment are
used. The cutoffs are derived by simply adding up the average
difficulties (i.e., the proportions of students answering correctly)
on the placement test items for each segment and rounding to the
nearest whole number. In practice, this usually means a student must
attain 7 or 8 right out of 10 in order to be credited for
completing a segment for placement information purposes.
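
The cutoff computation is simple arithmetic and can be sketched as
follows. The item values are hypothetical, the function names are
introduced here for illustration, and treating a score equal to the
cutoff as a pass is an assumption, since the report does not say
whether the cutoff must be met or exceeded.

```python
import math


def segment_cutoff(item_proficiencies):
    """Sum the per-item proportions correct and round to a whole number.

    item_proficiencies holds, for each placement item of one segment,
    the proportion of segment completers who answered it correctly.
    """
    total = sum(item_proficiencies)
    # floor(x + 0.5) rounds halves upward; Python's round() would
    # instead round halves to the nearest even number.
    return math.floor(total + 0.5)


def passes_segment(n_correct, cutoff):
    """Assumed rule: a student is credited when the score reaches the cutoff."""
    return n_correct >= cutoff


# Hypothetical proportions correct for a 10-item segment set.
proportions = [0.82, 0.75, 0.68, 0.90, 0.71, 0.77, 0.80, 0.66, 0.73, 0.85]
cutoff = segment_cutoff(proportions)
print(cutoff, passes_segment(8, cutoff), passes_segment(7, cutoff))
```

With these illustrative values the proportions sum to about 7.67, so
the cutoff falls in the 7-or-8 range the text describes.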



Placement Test Verification 

Verification of the operational effectiveness of a placement
test can be examined using either item- or score-level data.
However, assuming the item-level data used for selecting items were
based on reasonable-size and representative samples (e.g., at least
several hundred students from a variety of schools), the primary
focus in empirical verification should be on score-level data. One
kind of data that is relatively easy to obtain is the set of
placement test subscores of students corresponding to each segment,
scored 1 or 0 (i.e., pass/fail) based on a designated cutoff score.

Verification using such data focuses on answering the following
question: Are the placement patterns observed consistent with the
structure of the instructional materials? The typical expectation
is that student placement patterns will display a form of Guttman
scale, i.e., patterns should not show reversals. For example, if a
student exceeds the cutoff score on segment 3, the student should
also exceed this level on segments 1 and 2.
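
The reversal check implied by the Guttman-scale expectation can be
sketched as below. The function names are hypothetical, and the rule
of placing a student in the first segment not passed is an assumption
made for illustration; the report does not spell out how students
with reversals are placed.

```python
def has_reversal(passes):
    """True when a segment is passed after an earlier segment was failed.

    passes is a list of booleans, one per segment in instructional order.
    """
    seen_fail = False
    for passed in passes:
        if not passed:
            seen_fail = True
        elif seen_fail:
            return True
    return False


def placement_segment(passes):
    """Assumed rule: place the student in the first segment not passed.

    Returns len(passes) + 1 when every segment is passed.
    """
    for number, passed in enumerate(passes, start=1):
        if not passed:
            return number
    return len(passes) + 1


def percent_expected(patterns):
    """Percentage of students whose pass/fail pattern shows no reversal."""
    consistent = sum(1 for p in patterns if not has_reversal(p))
    return 100.0 * consistent / len(patterns)


# Hypothetical pass/fail subscores for four students on four segments.
students = [
    [True, True, False, False],   # expected pattern, placed in segment 3
    [False, False, False, False], # expected pattern, placed in segment 1
    [True, False, True, False],   # reversal: passes 3 after failing 2
    [True, True, True, True],     # expected pattern, all segments passed
]
print(percent_expected(students))  # 75.0
```

Tabulating the consistent and inconsistent patterns by placement
segment yields the kind of summary the verification question calls
for.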

An example of such verification data is given in Table 8. It
is based on data from over 8,000 students from several districts on
a placement test referenced to a reading program with eight
segments. The data show that the expected placement patterns were
observed overall for about 90% of the 8,208 pupils receiving the
test.
TABLE 8

Summary of Placement Patterns

                                      Number      Number of          Percent
           Expected Patterns          Placed in   Students with      of
Segment    1  2  3  4  5  6  7  8     Segment     Expected Pattern   Reversals

   1       -                           1,350        1,248               8
   2       +  -                        1,062          937              12
   3       +  +  -                       658          574              13
   4       +  +  +  -                    629          598               5
   5       +  +  +  +  -               1,052          885              16
   6       +  +  +  +  +  -              613          522              15
   7       +  +  +  +  +  +  -           697          611              12
   8       +  +  +  +  +  +  +  -        761          761               0

TOTALS                                 8,208        7,522              10

+ indicates above cutoff for the segment
- indicates below cutoff for the segment
SUMMARY

This paper discussed a framework for achievement testing in
instructional programs that have identifiable intentions and
resources. The framework entails three kinds of tests: placement,
progress, and attainment. A precise method for designing and
developing the instruments was then presented. The methodology is
designed to ensure that the test instruments and results serve
carefully defined functions and accurately describe and reflect
instructional program effects. As such, the specific concepts and
skills addressed and the emphasis they receive in the instructional
materials and procedures provide the basis for defining the test and
reporting structure.

The central element in this framework is the attainment test,
and the key design feature of this instrument is the program
segment. A program segment is somewhat akin to a well-defined
"domain" in criterion-referenced testing (e.g., Millman, 1974).
However, unlike domain-referenced tests, the segment attainment test
will likely include a range of concepts and skills that would be
regarded as heterogeneous from a domain-referenced test perspective.
The logic for including such items within the same test (and perhaps
the same score) resides in the structure of the instruction and
reporting information rather than in domain logic. The major
issue in determining whether multiple scores are appropriate in an
attainment test is the diversity present in terms of the
instructional formats used and the relative ease with which the
attainments can be described to audiences outside the classroom.
These concerns often converge in practice, i.e., instructional
structures that use different formats usually require multiple
scores to describe the effects.

The method described in the paper is clearly appropriate in
connection with any instructional product system used in a formal
schooling instructional program. Preliminary results indicate the
methodology is extendable to a broad range of instructional programs
and product systems (Hanson & Bailey, 1980).




References 



Behr, G., & Hanson, R. A. Differential access to instruction: A
source of educational inequality. Paper presented at AERA
Annual Meeting, New York, New York, April, 1977.

Buros, O. K. Fifty years in testing: Some reminiscences,
criticism, and suggestions. Educational Researcher, 1977,
6(7), 9-15.

Buros, O. K. (Ed.) The Eighth Mental Measurements Yearbook.
Highland Park, NJ: Gryphon Press, 1978.

Follettie, J. F. Task analysis and synthesis as precursors of
productive instruction. SWRL Educational Research and
Development, Los Alamitos, California.

Hanson, R. A. Bringing about basic changes in education. Paper
presented at AERA Annual Meeting, Toronto, Canada, April, 1978.

Hanson, R. A., & Schutz, R. E. A new look at schooling effects from
programmatic research and development. Making Change Happen,
edited by D. Mann. New York, New York: Teachers College
Press, 1978, 120-149.

Hanson, R. A., Bailey, J. D., & Molina, H. M. The implications of
intra-program placement decisions for the understanding and
improvement of schooling. SWRL Educational Research and
Development, Los Alamitos, California, 1980.

Hanson, R. A. Evaluation and planning. SWRL Educational Research
and Development, Los Alamitos, California, 1980.

Hanson, R. A., Bailey, J. D., & Schutz, R. E. Program-fair
evaluation of instructional programs: Initial results of the
kindergarten reading readiness inquiry, 1977. Technical Report
No. 57. SWRL Educational Research and Development, Los
Alamitos, California.

Hanson, R. A., Schutz, R. E., & Bailey, J. D. What makes
achievement tests tick: Alternative instrumentation for
instructional program evaluation. SWRL Educational Research
and Development, Los Alamitos, California, 1980.

Hanson, R. A., & Bailey, J. D. Program fair assessment,
identification of program effects and empirical curriculum
inquiry. SWRL Educational Research and Development, Los
Alamitos, California, 1980.






Madaus, G. F., Airasian, P. W., & Kellaghan, T. School
Effectiveness: A Reassessment of the Evidence. McGraw-Hill,
New York, 1980.

Millman, J. Criterion-referenced measurement. Evaluation in
Education: Current Applications, edited by W. James Popham.
Berkeley, CA: McCutchan, 1974.

Popham, W. J. Educational Evaluation. Englewood Cliffs, New
Jersey: Prentice-Hall, 1975.

Tyler, R. W. Constructing Achievement Tests. Bureau of Educational
Research, Ohio State University, 1934.

Wolf, R. M. Evaluation in Education. New York: Praeger
Publishers, 1979.






